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ABSTRACT 

We propose a new method to recover the cosmological initial conditions of the presently 
observed galaxy distribution, which can serve to run constrained simulations of the Local 
Universe. Our method, the Reverse Zeldovich Approximation (RZA), can be applied to radial 
galaxy peculiar velocity data and extends the previously used Constrained Realizations (CR) 
method by adding a Lagrangian reconstruction step. The RZA method consists of applying 
the Zeldovich approximation in reverse to galaxy peculiar velocities to estimate the cosmic 
displacement field and the initial linear matter distribution from which the present-day Local 
Universe evolved. We test our method with a mock survey taken from a cosmological simula- 
tion. We show that the halo peculiar velocities at z = are close to the linear prediction of the 
Zeldovich approximation, if a grouping is applied to the data to remove virial motions. We 
find that the addition of RZA to the CR method significantly improves the reconstruction of 
the initial conditions. The RZA is able to recover the correct initial positions of the velocity 
tracers with a median error of only 1 .36 Mpc//i in our test simulation. For realistic sparse and 
noisy data, this median increases to 5 Mpc//z. This is a significant improvement over the pre- 
vious approach of neglecting the displacement field, which introduces errors on a scale of 10 
Mpc//i or even higher. Applying the RZA method to the upcoming high-quality observational 
peculiar velocity catalogues will generate much more precise constrained simulations of the 
Local Universe. 
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1 INTRODUCTION 

1.1 Constrained simulations 

In the concordance ACDM model of structure formation, the cos- 
mic web of our Universe evolved through gravitational interactions 
from initial conditions (ICs) which constitute a linear Gaussian ran- 
dom field of density perturbations dKolb et alJI 1990h . These initial 
conditions completely define the assembly history of the contem- 
porary large-scale structure (LSS) of the Universe, and with the 
knowledge of the theory of gravity, the galaxy formation process, 
and the basic cosmological parameters, the formation and evolution 
of the LSS can be modelled. Today, the most successful method to 
do so are numerical jV-body simulations (e.g. Springe fet al.ll2005l 
120081 : iTevssier et aHl2009l : lKrvpin et alj|201 These simulations 
reached an impressive dynamical range in mass and length scales 
and can be applied to study many different aspects of cosmologi- 
cal structure formation. It is however not straightforward to link the 
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insights gained by cosmological simulations to observations of the 
Universe. The region that is best studied and accessible observa- 
tionally is the Local Universe, i.e. the Local Group and its immedi- 
ate large-scale environment. A very attractive m ethod is provided 
by th e constrained realizations (CR) algorithm (Hoffman & Ribak 
Il99lh . which allows to impose observational data as constraints on 
the ICs. The simulations obtained from such constrained ICs are 
able to yield structures which can closely mimic those in the actual 
Local Universe. These constrained simulations provide an ideal nu- 
merical laboratory to study various aspects of structure formation 
in the Local Universe, from the dynamics of the nearby massive su- 
perclusters down to the formation, evolution, and properties of the 
Local Group itself and its satellite galaxies. 

The key requirement for running constrained simulations is 
to generate a good estimate of the Gaussian initial conditions. The 
constrained simulations method implements the Bayesian approach 
in order to estimate and reconstruct the most probable field given 
the assumed Gaussian PDF and some observational data, filtering 
out statistical noise from observational errors. 
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A very useful al gorithm is p r ovided by the forma lism of the 
Wiener Filter (WF. Iwienej d 1 9491) . IZaroubi et al.ldl 995k ). The WF 
reconstruction is then augmented with rando m small-scale per- 
turba tions using the Hoffman-Ribak algorithm dHoffman & Ribakl 
Il99ll) to yield a constrained realization of ICs with the required 
statistical properties, i.e. enough power on all scales resolvable 
by the simulation. Two directions can be taken for getting appro- 
priate input data: using either the redshift positions or the pecu- 
liar ve l ocities of observed galaxies . Earl y attempts bylKolatt et"al] 
dl996h . iBistolas & Hoffmanl dl998l) . and llVIathis etafl d2002h used 
redshift catalogues to obtain an estimate of the underlying field, 
taking this field backwards into the linear (Gaus sian) regime us- 
ing t he Eulerian Zeldovich-Bernoulli equation dNusser & Dekell 
1 1992b . and then generating initial conditions for simulations with 
th e CR algorithm . More recently, this approach w as enhanced 
by Lavaux (2010) by using the MAK reconstruc tion (Frisch et al. 
120021; iBrenier et al.ll2003l;lMohavaee et al.ll2003h . and by iKitaurii 
d2012l) . who performed the Gaussianization by Hamitonian sam- 
pling with a Gaussian-Poisson model. 



1.2 CLUES simulations from peculiar velocities 

Although the redshift space approach can already produce a rea- 
sonable estimate of the largest observed structures, using galaxy 
peculiar velocities for input data has several benefits. Catalogues of 
galaxy peculiar velocities (more precisely, their radial components) 
are constructed from galaxy redshift me asurements and i ndepen- 
dent estimates of their distance (see e.g. iTullv et al.ll2009h . Direct 
distance measurements do not suffer from redshift distortions, and 
the derived peculiar velocities provide a direct tracer of the underly- 
ing gravitational potential and therefore the total mass distribution 
without suffering from the galaxy bias. Statistically, the peculiar ve- 
locity field at z = is much closer to a Gaussian distribution than 
the density field traced by galaxy redshift positions, which facili- 
tates Bayesian reconstruction. Moreover, peculiar velocities feature 
strong large-scale correlations and therefore allow to extrapolate 
the reconstructed field into regions not covered by measured galax- 
ies, such as outside of the data zone, and inside the Zone of Avoid- 
ance which is obscured by the Milky Way disk; as a result, recon- 
structions using peculiar velocities are typically much less sensitive 
to data that is sparse, incomplete, and distributed in a statistically 
inhomogeneous way. This approach of using radial pe culiar veloc- 
ities f or Bayesian reconstruction was pioneered by IZaroubi et al.l 
( 1999), who laid out the necessary theoretical framework, and 
the first high-quali ty simulations based on it were conducted by 
Klypin et al]|2003h. uilding upon this work, the CLUES project 



Gottlober et al. 2010) is dedicated to construct simulations that 
reproduce the Local Universe and its key ingredients, such as 
the Local Supercluster (LSS), the Virgo cluster, the Coma clus- 
ter, the Great Attractor (GA) and the Perseus-Pisces superclus- 
ter. The observational data is provided by a collaboration with the 
Cosmicflows program dCourtoisl2011allb[|Courtois & Tullvll2012bl : 



lTullv&Courtoisll2012l) , whose goal is to determine galaxy dis- 
tances for 30,000 galaxies in the Local Universe with systematic 
errors below 2%. CLUES simulations are used to study a vari- 
ety of aspects of the Local Universe. This includes the question 
whether our own Local Gr oup is a "typical" object an d studies of 
its mass accretion history (Forero-Romero et al. 201 1), the unusu- 
ally cold local Hubble low I Martinez- Vaqu ero et al.l2009l) . and the 
abundance and spatial distribution of satellite galaxies in the Lo- 
cal Gro u p dLibeskind et al.ll20ld : lklimentowski et alj20ld : lKnebel 
l2011b| [al: lDiCintioetal.ll20121) . 



CLUES simulations have a lso been applied to investigate 
the possibility of WDM models dYepes et al Jl2009T) and to simu- 
late possible observations of dark matter dCuesta et al.ll201 ll) . The 
tremendous increase in data quality of peculiar velocity observa- 
tions in recent years presents a serious challenge for the simulators: 
the present methods relying on this data to generate constrained ini- 
tial conditions and constrained simulations have significant limita- 
tions and need to be improved as well, in order to optimally utilize 
the wealth of additional information contained in the new data sets. 
This challenge provides the main motivation for this work. 

The main drawback of current CLUES simulations is that, 
while they are able to reproduce massive clusters like the Virgo, 
the Great Attractor (GA), Coma, and Perseus-Pisces, they do not 
directly constrain the structure on scales smaller than such mas- 
sive clusters, which are therefore dominated by the random compo- 
nent added by the CR algorithm (although all objects that emerge 
on these smaller scales are still located in the correct large-scale 
environment). The reason is that the CR formalism is formulated 
completely within the formalism of Gaussian fields, thereby as- 
suming that the linear theory of density perturbations is valid on 
all scales. The data used as input is however observed at present 
time, after having undergone the highly non-linear structure for- 
mation process, and the linearity assumption is only valid down 
to a certain length scale. A particularly strong error source is the 
cosmic displacement field: the galaxies observed today are located 
at different positions than their progenitors in the linear regime 
due to their motion that arises from the large-scale gravitational 
field. At z = 0, the average amplitude of this displacement field is 
ss 10 Mpc/fe, severely limiting the validity of the linear formalism 
on scales smaller than this. Accounting for this non-linearity must 
therefore be done by a reconstruction of the displacement field and 
the initial distribution of the data in the linear regime, given the fi- 
nal distribution. This task is commonly termed Lagrangian recon- 
struction and is a highly non-trivial undertaking due to the under- 
lying strongly non-linear dynamics. Methods developed so far usu- 
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until now exclusively focused on galaxy redshift surveys, and there- 
fore their accuracy is limited by t he high non-lineari ty and strong 
observational biases of such data dLavaux et al.l|200 8j); in addition, 
they are all very expensive computationally, and they do not make 
use of the known statistical properties of the initial conditions that 
are to be reconstructed, thus lacking self-consistency with the as- 
sumed cosmological model. 



1.3 Outline 

This work is the first part of a serie of three papers by Doumler 
et al. In this series, we present a novel self-consistent method for 
Lagrangian reconstruction that is designed for the application to 
peculiar velocity data, which we call Reverse Zeldovich Approxi- 
mation (RZA). We test the validity of this method using mock data 
from a given cosmological simulation, serving as the test Universe. 
Applying the RZA does not require expensive computations, and 
at the same time allows to generate a good estimate of the cosmo- 
logical initial conditions which can then be used to run constrained 
simulations. The resulting simulations have a significantly higher 
accuracy compared to those run from ICs generated with the previ- 
ous method (without Lagrangian reconstruction). In this first paper, 
we present the method itself, how to apply it to realistic observa- 
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tional data, and how to use it to construct ICs for constrained sim- 
ulations. 

This paper is organised as follows: In Section[2] we review the 
CR method, the problem of Lagrangian reconstruction, and intro- 
duce the idea of the RZA. In Section^ we verify the validity of the 
RZA on a cosmological simulation. Section [4] describes how the 
RZA can be applied to realistic data. Here, we test its performance 
using mock data drawn from the same simulation, but with obser- 
vational errors, a relatively small data volume, high incompleteness 
and knowledge of only the radial component of the haloes' peculiar 
velocities. We then describe in Section[5]how the RZA can be used 
to reconstruct the cosmological initial conditions of the input data. 
Our findings are summarized and discussed in Section|6] 



2 THEORETICAL FRAMEWORK 
2.1 The Wiener Filter 

Within the linear theory of Gaussian random fields, the Wiener Fil- 
ter (WF) reconstruction method is an established tool for the pur- 
pose of reconstructing the underlying density (6) and peculiar ve- 
locity (v) fields from sparse and noisy data s ets ([Rvbicki & Pressl 
19921: fa sher et a l]ll995l;lzaroubi et al. 199 51 lErdogdu et al.ll20od : 
Hoffmarj 120091 : iKitaura et al] 120091 ; ICourtois et al.ll2012h . In the 
case of peculiar velocity data, we are given the radial component v r 
of the three-dimensional peculiar velocity field v sampled at some 
discrete positions r. This data is additionally corrupted with obser- 
vational errors e, so that the input consists of a set of datapoints or 
"constraints", T = (C,| with C, = c, + e,, where c, = v rj . Here, a 
data point is separated into the actual signal that is contributed by 
the underlying error (c,) and the statistical observational error (e). 
Two basic underlying assumptions are made here, one is that the er- 
rors are statistical and all the systematic errors have been accounted 



for and that the errors and the underlying d ynamical fields are sta- 
tistically independent (Zar oubi et al.ll 1993) . For this kind of data, 
the Wiener Filter is used in its cartesian real space formulati on; for 
a full description of this formalism see IZaroubi et all dl999l) . The 
Wiener Filter mean field is given by 



/<5 WF (r) = (S(r)a) (CCj)-' Cj 
D w (r) = (v(r)c,) (CCj)-' Cj 



(1) 
(2) 



where (C,C,) is the data autocorrelation matrix of the data, and 
{S(x)cj) and (h(jc)c;) are the cross-correlation matrices of the data 
with the fields to be reconstructed. One should note that the cross- 
correlation matrices do not correlate the noise, i.e. the statistical 
errors, with the signal and therefore the cross-correlation with c t is 
written. The data auto-correlation matrix is written: 



where 



{CCj) = (acj) + ajSij, 



(3) 



(4) 



Equation [4] assumes that the errors of different data points are sta- 
tistically independent. The error matrix needs to be tailored to the 
data base at hand. 

Because we are dealing with Gaussian random fields, the cor- 
relation matrices which describe the underlying fields are com- 
pletely defined by the power spectrum P(k) of the prior model. It 
can be shown that the WF mean field is the optimal estimator for 
Gaussian random fields. It is equivalent to the minimal variance 
estimator and the Bayesian conditional mean field (most probable 



field given the data and the prior model). The data c, describes only 
one component = ■ v, where = (r - ro)/|r - fol is the unit 
vector along the radial direction with respect to the observer posi- 
tion ro. This requires to compute the correlation functions (Sv^) and 
(v/jVy) for arbitrary e^, e v , which is given by 



mr'^ir' +r))=e,-{S(r')v(r' + r)) , 
(tVCr'KC/ + r)) = e„ • <i>(r') v(r' + r)> • e v , 

where the density-velocity correlation vector is given by 



(6(r')v(r' + r)) a 



(2. 



aj_ r 



-ikg 

k 2 



P(k)e ikr dk 



and the velocity- velocity correlation tensor is given by 
{v(r')v(r' + r)) a 



(2tt) 3 X \ V I 



(5) 
(6) 



(7) 



(8) 



where a,p e {x,y,z} are the three cartesian components, and P(k) 
is t he cosmologic a l pow er spectrum of the assumed prior model. 
See IZaroubietalJ l l 19991) for more details on how to evaluate ex- 
pressions and ((S). 

The formalism presented here does not account for the fact 
that the observed peculiar velocities, used here as constraints, sam- 
ple a non-linear velocity field. Peculiar velocities of galaxies out- 
side the cores of rich clusters constitute a good proxy to the lin- 
ear velo city field in the standard m odel of cosmology. We adopt 
here the Bistolas & Hoffmanl i 19981) partial remedy for their quasi- 
linearity. This consists of adding a constant term to the diagonal 
of the error matrix, aimed at making the data statistical compatible 
with the prior model. In the case where the assumed prior model 
is 'truly' the model of the universe, that the data does not deviate 
from linear theory and that the errors are estimated correctly, then 
the reduced chi-squared of the data, x 2 , should be very close to 
unity, where 



X' 



M 



(9) 



and M is the number of data points. In all observational data bases 
used before within the CLUES collaboration and the mock cata- 
logs constructed here the resulting reduced chi-squared has always 
been found to be larger than unity, indicating an excess of power 
over what i s predicted by the prior mo del within the linear regime. 
Following lBistolas & Hoffmarj i f 19981) we augment the error matrix 
with a constant a NL term, 



<£,£,) = (07 + a NL ) Sjj 



(10) 



The value of a NL is tuned so as to equate^- 2 = 1.0. 

As a part of this work, we developed the numerical software 
package ICeCoR^], a highly efficient parallelized code written in 
C++. One part of ICeCoRe's functionality is the ability to effi- 
ciently compute the WF mean field for very large datasets. This 
includes correctly inverting the MxM data auto correlation matrix 
(where M = \T\ is the number of datapoints) and then evaluating the 
WF mean field on a full three-dimensional cubic grid with high res- 
olution. The matrix inversion is performed by using the Cholesky 
decomposition method, whose main advantage is its high numer- 
ical stability. The computational cost of the inversion scales with 
0(M 3 ), and the maximum rank M of the matrix that can be inverted 
is limited in principle only by the available memory. For M = 4000, 
the inversion takes only a couple of seconds in serial. For matrices 
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larger than M » (1 - 2) x 10 4 , it is recommended to switch to 
parallel machines. For M = 5 x 10 4 , the inversion takes around 
60 minutes with eight OpenMP threads on two quad-core 2.4 GHz 
Intel Xeon processors, with 18 GB of memory required for the in- 
version. We successfully tested ICeCoRe for up to M = 1(P data 
points and have not encountered any issues of numerical stability. 
ICeCoRe also contains a number of utilities to manipulate the input 
data, to vary the implementation details of the WF algorithm, and 
to post-process the obtained three-dimensional fields. 



2.2 The Constrained Realizations algorithm 

At the heart of the CR method (in its linear-theory form) lies 
the idea to use the Gaussian field reconstructed with the WF as 
an estimate of the cosmological initial conditions underlying the 
data. However, while the WF is the optimal estimator for Gaus- 
sian random fields, it is also a very conservative estimator. It 
tends towards the unconstrained mean field (i.e., the null field) 
in regions not sampled by data and if the data is strongly cor- 
rupted by noise. It is therefore not power-preserving: the solution 
will lack power on scales not covered by the constraints. How- 
ever, the initial conditions required to run a cosmological sim- 
ulation must be a Gaussian random field realization of the as- 
sumed power spectru m P(k) at all scales resolved by the simula- 
tion. The CR method fBertschingedll987l;lBinnev & Ouinnlll99ll: 
Hoffman & Ribald 1 1 99 lL 1 19921 ; Ivan de Weygaert & Bertschingen 
19961 : iPrunet et al.l I2008T) provides a way to compensate for the 
missing power by adding fluctuations from an independently gen- 
erated random realization (RR). T he optimal exact algorith m for 
generating a CR was discovered bv lHoffman & Ribakl7l99lh . We 
first generate a random realization, S RR , and then "observe" it to 
get a set of "mock constraints" T = {C, j. The C, constrain the same 
quantities at the same positions as the original data C, , but the val- 
ues are drawn from S RR instead. Then, a constrained realization is 
generated with 

S CR (r) = 6 RR (r) + (S(r)c t ) {C.CjT 1 (Cj - Cj) , (11) 
v CR (r) = v RR (r) + (v(r)c i )(C i C J )- > (C J -C J ) . (12) 

In regions where the data c, are dense and accurate, such a CR will 
be dominated by the data and tend towards the WF mean field, i.e. 
the most probable linear field given the data and the P (k). Con- 
versely, the result will be dominated by the random component <5 RR 
in regions not sufficiently constrained by the c,, i.e. where they are 
sparse, noisy, or not present at all. In general, the result will show 
a smooth transition between both regimes and produce a Gaussian 
random field that is statistically homogeneous, obeys the assumed 
prior model P(k), and can therefore be used for cosmological ICs. 
An efficient implementation of this CR algorithm is provided in 
our ICeCoRe code. In order to start an iV-body simulation from 
S^, we merely have to linearly scale it to the chosen starting red- 
shift Zi„i t and to sample it with A' discrete particles (for exampl e 
with the established Zeldovich method, see lEfstathiou etail l 985 ). 
This is the basic method how CLUES simulations have been set 
up until now, aside fr om some practical implementation issues (see 
iGottlQberetaifeoiOh . 

Since the WF operator is mathematically very similar to the 
CR operator, except that the latter adds the RR terms, in the fol- 
lowing we will refer to the general algorithm as WF/CR. 



2.3 The Zeldovich approximation and Lagrangian 
reconstruction 

The linear theory of structure formation describes the density 
and peculiar velocity fields as time-independent fields except for 
the cosmology-dependent growth factor D and growth rate / = 
d ln(£>)/d ln(a), considering only the growing mode, 

6(x,z) = D(z)S (x) , (13) 

u(x,z) = -afV- l S(x,z) , (14) 

where we use comoving coordinates x = r/a, u = Ax I At I a. Moving 
beyond the linear theory of density perturbations, the time evolu- 
tion of the cosmic density distribution can be described in the for- 
malism of Lagrangian perturbation theory. If the initial conditions 
are sampled by some tracers at initial (homogeneously distributed) 
Lagrangian positions q, then these positions will change as a result 
of motions induced by the cosmic gravitational field, and the actual 
positions x(z) at any redshift z are given by 

x(z) = q(x) + #(x, z) , (15) 

and as a result the density and velocity fields will change as well. 
The deviation of these fields from the initial conditions as they 
evolve forward in time can therefore be described in terms of the 
displacement field if/(z)- Here, the first-order approximation has 
proven to be very useful and surprisingly accurate well into the 
quasi-linear regime, namely that the displacement field is itself con- 
stant except for the growth factor D and proportional to the peculiar 
velocity, 

i/,(x,z) = D(z)if> (x) = -D(z)V- > S (x) , (16) 
u(x,z) = afif,(x,z) . (17) 

This is the famous Zeldovich approximation JZeldovichl Il970l: 
IShandarin & Zeldovich! [l989l) . Considering the next term, i.e. the 
acceler ation, leads to second-order Lagrangian pe rturbation theory 
or 21pt ( iBuchert etai1ll994l; iBouchet et al.lll995h : this can be fur- 
ther expanded to higher orders. 

While Lagrangian perturbation theory is doing well in describ- 
ing the evolution forward in time until non-linearities set in, the 
situation becomes much more complicated if we want to do the 
reverse, i.e. reconstruct the initial field Soiq) from a sampling of 
the evolved density or velocity field at z = 0. This is equivalent 
to tracing the field back in time from z = to some z init where 
6(x) ~ 6(q), at which point we would have obtained the cosmo- 
logical initial conditions we are looking for. In other words, we 
need a mapping of Eulerian coordinates x = r at z = to the 
corresponding Lagrangian coordinates q, that is, we need to ob- 
tain the cosmological displacement field tfr in Eq. dl5t at z = 0. 
This is an extremely complex task because cosmic structure for- 
mation and evolution is a highly non-linear process. Even if we 
could reconstruct d(r) at z = to arbitrary precision (which is of 
course impossible), we could not just integrate the equations of 
motion back in time, such as by running a cosmological Af-body 
simulation backwards, to recover S(q) at Zmit- Although gravity is 
in principle invariant under time reversal, reversing the time direc- 
tion would turn the decaying mode of perturbation growth into a 
growing mode, which will rapidly increase and amplify any un- 
certainties in the data or even slight numerical errors until they 
eventually dominate the solution. With such a procedure, the prob- 
ability of recovering the highly ordered state of homogeneous and 
almost uniform initial conditions will be infinitely small. Addi- 
tionally, such a time reversal could not be carried out in a con- 
sistent way, because shell crossing and dissipational processes ef- 
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fectively erase information ab out the initial state of the Universe 
dCrocce & Scocci marro 200^). Yet, it should be mentioned that for- 
ward mode ling algorithms should be able to overcome these prob- 
lems (e.g. lKitaura et alj20ld : ljasche & WandeltH2012h . 



2.5 The Reverse Zeldovich Approximation 

In the remainder of this paper, we will denote the peculiar velocity v 
and the physical position r as the special case z = of the comoving 
velocity u{z) and the comoving position x(z), respectively. In the 
following, we consider again the Zeldovich approximation. Then, 
at z = 0, the positions r and peculiar velocities v should be given 



2.4 Lagrangian reconstruction from peculiar velocities 

Previous simulations conducted in the CLUES framework 
dGottlober et al.l2oToh directly used observational peculiar velocity 
data as constraints for the CR algorithm to obtain the constrained 
initial conditions. The main drawback of this method is that it com- 
pletely neglects the fact that the data does not trace the initial con- 
ditions at some high initial redshift z illit , where all resolved scales 
lie in the linear regime of density perturbations, but instead the 
non-linearly evolved field at z = 0. Therefore, the accuracy of the 
method is limited to large scales above some critical scale where 
the difference between the initial and the final field becomes non- 
negligible. 

For the initial vs. evolved peculiar velocity field, this critical 
scale is around « 10 Mpc/h. It is an interesting fact that this crit- 
ical scale is much lower than the one for the initial vs. evolved 
density field, where at z = non-linearity, or in other words non- 
Gaussianity, is noticeable at much larger scales. Figure Q~]shows the 
peculiar velocity field component-wise at z illit = 30 (left) and at 
z = (right) for a cosmological simulation. Both look remarkably 
similar. The evolved field is shifted by the cosmological displace- 
ment field if/, which varies locally and has an average amplitude of 
» 10 Mpc//?, and there is a non-linear enhancement of the velocity 
amplitude at even smaller scales due to the gravitational collapse of 
structure and another non-linear component due to virial motions. 
However, on scales above those of gravitationally bound objects 
(i.e. a few Mpc//i) the velocity field v(r) is very close to the Gaus- 
sian distribution exp ected by linear theory dSheth & Diaferiol200ll ; 
lHamana et a"i]|2003l) . Because of these properties, the peculiar ve- 
locity approach works reasonably well for the large-scale structure 
even without any "time machine" method at all, as seen in previous 
CLUES simulations. The scale down to which this works will be 
determined by the scale of ifi. On the other hand, any attempt to 
construct a constrained realization from tracers of the density field 
must include some Gaussia nization procedure, which can be Eule - 
rian (such as the methods of Wei nberg! 1992tlNusser & Dekell 19921) 
or also include a Lagrangian reconstruction (such as the already 
mentioned action-minimizing methods). But avoiding the Gaus- 
sianization problem by using peculiar velocities still leaves us with 
a significant error due to the displacement field. This means that the 
constraints we use for Eqs. ( 1111 and ( 1121 are displaced by ifi rela- 
tive to the true initial conditions. As a result, the cluster positions 
in the evolved constrained simulations will be off by » 10 Mpc//? 
on average (and more in regions where the displacement field am- 
plitude is higher) com pared to the observed configuration at z = 
dGottlober etal.l2010l) . 

It is clear that, despite the favourable properties of peculiar 
velocities, in order to increase the quality of our initial conditions 
and constrained simulations down to scales lower than the ampli- 
tude of if/ and to remove the significant systematic position errors 
caused by if/, we need a Lagrangian reconstruction scheme that can 
be applied directly to the peculiar velocity data. In this work, we 
develop such a method, the basic idea of which will be described in 
the next section. 



r = q + if/(q) 
»(r) = H fif/(q) 



(18) 
(19) 



with a = Ho at z = 0. This is, of course, a very crude assumption: by 
z = the approximation will, in general, break down in overdense 
regions due to shell crossing and nonlinear gravitational interac- 
tion. However, we assume its validity for the time being. Thinking 
in the other direction, this enables us to obtain an estimate of the 
displacement field if/ and the Lagrangian position q, if the velocity v 
is known. By design, we are not interested in recovering the full dis- 
placement field if/{r) everywhere in the computational box; rather, 
it is enough to recover it at the positions of the available discrete 
data points used as the input data c, . We can then use these discrete 
displacement field values and their reconstructed Lagrangian posi- 
tions to constrain initial conditions, utilising the CR algorithm. This 
approach works well with the first-order Zeldovich approximation, 
since it is completely local and can be performed at discrete loca- 
tions, and with sparse and inhomogeneously sampled input data, 
whereas higher-order extensions depend on integrals over the com- 
putational volume. 

Let us consider a single position r where we have given a dis- 
crete value of the peculiar velocity field v(r) at z = (we will deal 
later with observational errors and the fact that only the radial part 
v r (r) is actually available). In what we call the Reverse Zeldovich 
Approximation (RZA), we can now simply reverse Eqs. ( 1181 and 
d!9l > and estimate the displacement of this data point as 



(4^ = — 

Hof 

and the initial position of the halo by 



v 

HoJ ' 



(20) 



(21) 



where df = Hof at z = 0, and we approximated xf^ A ~ ? RZA , since 
Zinh will be chosen sufficiently high so that there x ~ q. Both x^ A 
and i/r RZA could then be used to place a constraint for generating 
initial conditions at Zm\t (see Section [5}. This reconstructed initial 
position will be, in general, at some distance rf RZA from the actual 
initial position, which we define to be the "RZA error" d RZA for this 
data point, 



H f 



(22) 



where x inlt is the actual true position of this data point in the initial 
conditions at Zmit- It is interesting to study how well this relatively 
simple approach performs in practice. The RZA error rf RZA pro- 
vides a simple one-dimensional quantity useful to quantify the scale 
length down to which the RZA is valid for different data points and 
environments where they are located. In the next section, we do so 
using a cosmological simulation, where both the initial conditions 
and the distribution at z = are known. 

In Sections [4] and [5] we will then use the reconstructed if/ RZA 
and jc rza to obtain a new set of constraints for constraining initial 

init c 

conditions with the CR algorithm, which are more suitable than the 
original c, due to the Lagrangian reconstruction. 
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Figure 1. Slice through the BOX160 simulation with boxsize 160 Mpc//i, slice thickness 10 Mpc//i at the initial redshift z = 30 (left) and the final redshift 
Z = (right). Shown are projections of the two components of the peculiar velocity field that lie in the depicted plane, v x (top) and «„ (bottom). The left panels 
were generated directly from the initial conditions computed on a grid; right panels were generated by a binning of the evolved particle velocities to a regular 
256 3 grid with a TSC kernel. The z=30 velocity field (left) is scaled to z=0 by the linear theory. 



We have to note here that there have been similar attempts 
to estimate ifr by reversing the Zeldovich approximation, but from 
the density field at z = 0, using the linear-th eory assumption 
6 = -V ■ ilr jEisenstein et al.l2007l : lNoh et al.l2009i) or higher-order 
lpt jFalck et alj|2012h . The quality of these reconstructions is un- 
fortunately far too poor to obtain a set of constraints usable for 
constrained simulations, because of the high non-linearity of 6 at 
z = 0. However, we find that due to their nature, the application 
of this idea to peculiar velocities, i.e. our RZA method, performs 
substantially better. 



3 VALIDITY OF THE RZA AT Z = 
3.1 The test simulation 

For this study, we use the BOX160 simulation performed within the 
CLUES project. This is a constrained simulation of the Local Uni- 
verse with a boxsize of 160 Mpc/A, set up with the WMAP3 cos- 
mological parameters^] Q, m = 0.24, fl A = 0.76, and cr 8 = 0.75 and a 
matching ACDM power spectrum. The simulation contains a large- 
scale structure resembling the observed Local Universe, with ob- 
jects corresponding to the Virgo, Coma, Perseus-Pisces, and Hydra- 
Norma-Centaurus (Great Attractor) clusters, and a Local Group 



2 Although these parameters are outdated from today's perspective, we 
have found in other (non-constrained) test simulations that switching to a 
more recent set of parameters like WMAP7 does not influence the results 
obtained from RZA reconstruction. 
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candid ate. Details of the simulation are discussed in lCuesta et al.l 
J201 1). Theoretically, investigating the validity of the RZA method 
could be done on any (non-constrained) cosmological simulation. 
However, we choose a constrained CLUES simulation on purpose, 
so that the gained insight can easier be applied to the observed Lo- 
cal Universe. 

By employing a cosmological simulation as the model uni- 
verse, the relationship between peculiar velocities and cosmologi- 
cal initial conditions can be studied directly, because we have com- 
plete access to both the initial conditions and the distribution at 
z = 0. We use peculiar velocities of dark matter haloes at z = 
as a proxy for observable galaxy peculiar velocities. In the current 
theoretical picture of galaxy formation, all galaxies reside inside 
the potential wells of dark matter haloes. It is therefore a reason- 
able assumption that the observed galaxy peculiar velocities fol- 
low the peculiar velocities of their surrounding dark matter haloes. 
These are in turn directly accessible in the simulation snapshots. 
The BOX 160 is a collisionless-matter-only simulation. This pre- 
vents us from a proper modeling of the bias induced by the fact that 
galaxies are used to sample the velocity field. None of the current 
galaxy formation simulations has a dynamical range large enough 
that enables the resolution of sub-galactic scales with large cosmo- 
logical boxes. Hence, one needs to resort to DM-only simulations 
for modelling the galaxy distribution. 

We extract a halo catalogue from the BOX160 simulation us- 

I II — — 71 

ing AHF (Amiga's halo finder; Knollmann & Knebe 2009) at the 

z = snapshot of the evolved simulation. The peculiar velocities of 
each halo are extracted by AHF and defined as the average veloc- 
ity vector of all dark matter particles within a halo's virial radius 
R V1I . Only haloes of mass log(M/M Q ) > 11.5 from the simula- 
tion are considered; we want to discard haloes that are either too 
poorly resolved to obtain reasonable estimates on their peculiar ve- 
locity, or too small to host galaxies that would be observable in a 
galaxy distance survey. In the following, we investigate how well 
the displacement field if/ with respect to the simulation's initial con- 
ditions can be recovered from these halo velocities. Observational 
data features only the radial component of the peculiar velocities, 
and suffers from significant sparseness, incompleteness, a very lim- 
ited data volume and observational errors. We will investigate the 
impact of these limitations later in Section[4] 

3.2 RZA reconstruction 

Figure [2] illustrates a typical medium-size halo with virial mass 
M = 7.9 x 10 12 M o from the BOX160 simulation. The positions 
of particles within the virial radius R vn that have been identified by 
AHF are shown with black dots. The halo has a position r, which 
is defined by AHF as the position of the most bound particle, and a 
velocity v, which is the mean velocity of all of the halo's particles. 
In the initial conditions at zm = 30, the same particles, identified 
by their IDs, occupy the positions marked by the blue dots. Most 
of them form a coherent patch in space, the protohalc^. This patch 

3 Note that a small fraction of the particles at Zj n i t is not connected to 
the protohalo patch, i.e. the Lagrangian volume is disjoint. Those particles 
mostly ended up in the virialised halo after accretion or merging processes 
and subsequent relaxation that happened much later than z m it during the 
non-linear structure formation process. We will not attempt to track this 
process for each individual halo and choose to not treat those particles sep- 
arately, although they may affect the estimation of x m i t and iji. We also add 
that while the halo in Figure ff] is the most common case, there are some 
more extreme cases with more disjoint Lagrangian regions; the correspond- 
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Figure 2. RZA on a simulated dark matter halo identified at z. = with 
virial mass M = 1.9 X 10 12 M G , virial radius R vjl = 410 kpc//j. Black dots: 
positions of all particles inside R„,> at z = 0, with mean velocity v and the 
halo centre at position r. Blue dots: positions of the same particles in the ini- 
tial conditions at z\m\ = 30, with centre-of-mass at x m i t ("initial position"). 



ifr (blue arrow): actual displacement, with 



8.36 Mpc//i. i/r^ (red 



arrow): RZA reconstructed displacement. x^ t A : RZA reconstructed initial 
position. ii" 2 * (purple): the RZA error. 



covers an overdense region in the initial conditions that will later 
collapse to form the halo. If we neglect the tiny initial displace- 
ments of the particles ifr init at the initial conditions by approximat- 
ing jc init « q for each particle, then the volume V init occupied by 
the protohalo corresponds to the Langrangian volume of the halo. 
This volume can be relatively big, measuring about 10 Mpc//j in 
diameter for this halo mass, or even more than 20 Mpc//i for a mas- 
sive cluster. Since the initial density distribution is almost uniform, 
this volume depends directly on the mass via Vj n i, = M 3 /g, where 
g = 3H 2 Cl m /S7rG is the mean cosmic density. We then define the 
"initial position" of the halo x- m n as the centre of mass of all its 
particles at initial redshift z init . We further define the displacement 
of the halo as tfr = r - x in]t . For the following, it is important to 
remember that those are now quantities averaged over all particles 
that belong to the halo, and not values of the continuous fields at 
discrete points in space. 

We assume that each of the selected haloes in the simulation 
hosts a galaxy with an observable peculiar velocity v. We continue 
to use the assumption that the peculiar velocity of any observed 
galaxy follows the mean peculiar velocity of its surrounding dark 
matter halo host. Then we can directly use the velocities v(r) of the 
haloes in a simulation as a simple model for observational data. For 
each halo we now apply the RZA, i.e. Eqs. J20b and \2U , to produce 
our estimate of the displacement field, iff RZA , and the initial position 
of the halo *j^ A . In the simulation, we can identify all particles of 
all haloes by their ID and find them in the initial conditions at zmu 
so that the true values i/r and jc init are also known for each halo. With 
this we also determine the values of the RZA error d RZA (Eq. l22t 
for each halo. 
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Figure 3. Absolute displacement \if>\ vs. absolute peculiar velocity \v\ (top 
row) and the angle between these two vectors (bottom row) at z = for 
main haloes, i.e. haloes after all their subhaloes (if any) have been grouped 
together with the main object. 



3.3 The RZA displacement field 

The Zeldovich approximation breaks down on small scales where 
shell crossing occurs, marking the transition to the non-linear phase 
of structure formation. We expect that the RZA is not valid at all 
for haloes gravitationally bound to more massive haloes, i.e. orbit- 
ing, infalling, and merging substructure, since the magnitude and 
direction of their velocities at z = have been significantly altered 
by those processes. So in order to obtain a good estimate of the 
displacement field, those objects should be discarded. Physically, 
this would mean to detect and remove haloes that are gravitation- 
ally bound to more massive host objects, as well as ongoing ma- 
jor mergers and possibly other scenarios. This is a highly compli- 
cated task in itself, so we use a simplified scheme instead, which 
does not consider the underlying physics but works well for our 
pu rpose. Rather than properly fi nding actual subhaloes (such as in 



ur po 

e.g. lKnollmann & Knebj200Sl ). we simply detect haloes that share 
at least one dark matter particle with another halo and therefore 
presumably interact in some way or another, without investigating 
the nature of this interaction. We then call a halo a "subhalo", if it 
shares at least one dark matter particle with another halo more mas- 
sive than itself. Conversely, we call a halo a "main halo" if it shares 
only particles with less massive haloes or with no other haloes at all. 
By this definition, the sphere described by the virial radius R V u of a 



ing haloes at z. = are mostly the res ult of violent major merger events. For 
a thorough study of protohaloes, see Ludlow & Porciani (201l]). 
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Figure 4. Displacement error with RZA reconstruction (d RZA , blue) and 
without RZA reconstruction (\ifi\, red), for all main haloes with mass M > 
10 1L5 M o inside BOX160, depending on the the absolute halo velocity (top) 
and the underlying local density (bottom). For each bin, the point is placed 
at the median, and the bar shows the interval between the 25 th and 75 lh 
percentile. Additionally, the number of haloes in each bin is given. Note the 
different scales for the d axis. 



"main halo" will never overlap with the R VK sphere of a more mas- 
sive halo. In this way, we divide all the identified haloes in the sim- 
ulation into two groups, subhaloes and main haloes, and keep only 
main haloes for the RZA analysis. This simple and rather conserva- 
tive scheme is very effective in filtering out virial motions from the 
peculiar velocity data, since from clusters or other gravitationally 
bound systems only the main object will be retainedj. 

Figure [3] shows the absolute halo displacement over the 
halo velocity |u| and the angle between those vectors at z = for 
the main haloes in BOX160. Subhaloes contain little to no informa- 
tion about their cosmological displacement from their momentary 
velocities at z = 0. On the other hand, for the main haloes, the 
approximation holds reasonably well. In the low-velocity regime, 
which mostly corresponds to the low-overdensity regime, the re- 
lation is satisfied nearly perfectly; further up the plots reproduce 



4 It is currently not yet clear how to construct a similarly effective group- 
ing method for data points in observational galaxy peculiar velocity sur- 
veys. This problem is a s ubject of ongoing research in the CLUES project 
ICourtois & Tully|2012j) . 
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the well-known tendency of the Zeldovich approximation to under- 
estimate the total velocities, since it neglects the additional grav- 
itational acceleration and deflection created by the dynamically 
changing density distribution. Nevertheless, the direction of dis- 
placement is in general conserved very well in the velocity vec- 
tor: for the majority of identified haloes the angle between the two, 
a = acos [(v ■ if/) / (|»| ■ |^|)], lies below « 10°, and for 95% of the 
main haloes, it lies below » 30°. This could be further reduced 
by a more conservative "grouping" algorithm. We found that if we 
discard all haloes that are closer than 2.5 Mpc//i to a more mas- 
sive halo, then for half of the haloes the angle lies below « 7°, 
and for 95% of them below « 15°. After conducting this study, 
we learned that a simi lar investigation was already carried out by 
IShefh & Diaferiol feOOll) . who compared the peculiar velocities of 
haloes at z - with their initial velocities u in the linear regime, 
instead of their displacement ifi as we did here. They likewise found 
that halo peculiar velocities at z = retain the information from the 
initial conditions very well, with deviations in the angle of motion 
of typically only 10°, if one thoroughly filters out virial motions. 
Our results are in very good agreement with theirs. 

The remainder of this analysis concentrates on the main 
haloes, since we now established that subhaloes should not be con- 
sidered for RZA. This leaves us with a total of 29122 objects with 
log(M/M Q ) > 11.5 within the BOX160. We focus on the error cF^ 
of the RZA on the initial halo position guess (equationl22b. a prac- 
tical quantity to estimate the validity of the approximation. Figure 
[4] shows d RZA (blue) compared to the error on the initial positions 
that we would make without any Lagrangian reconstruction (red). 
In the latter case the error would be simply the displacement \if/\ 
itself. 

The analysis reveals that a majority of the main haloes have 
a surprisingly low d RZA : the median is at 1.36 Mpc//i, the mean 
at 2.3 Mpc/h. This is well below the scale on which the first-order 
Zeldovich approximation is normally considered to be valid. Above 
all, it is a significant improvement over the "0th order" linear theory 
approximation that the overdensity peaks traced by the haloes do 
not move at all, leading to a mean error of (\if/\) = 8.7 Mpc//i for this 
particular simulation. The distribution of d RZA is highly skewed: for 
most of the haloes, rf RZA is within a few Mpc//7, but at the same time 
a small fraction of objects has a very high d KLA . Because of this 
skewness, the median and the upper and lower quartiles are shown, 
being more meaningful than the mean and the lcr interval. 

As expected, the success of RZA depends highly on the un- 
derlying overdensity. In higher-density regions, the non-linear en- 
hancement of peculiar velocities is stronger, and shell-crossing oc- 
curs earlier, so that the Zeldovich approximation is not an optimal 
description of the dynamics there. Although we discarded the sub- 
haloes, still some shell crossing will occur for the main haloes in 
the overdense regions. The dependence of d KLA on the total velocity 
is also strong because high velocities are associated with dense en- 
vironments. A couple hundred objects even have d RZA > \lfi\, mean- 
ing that RZA completely fails there. They stick out in the bottom 
panel of Figure [4] as the three last bins with the highest density. 
All of those outliers are relatively low-mass objects travelling at 
high velocities in the immediate vicinity of one of the most mas- 
sive clusters in the box and thus experiencing significant non-linear 
contributions to their peculiar velocities. They can be removed by 
a scheme that is more rigorous than our shared-particles approach, 
such as increasing the minimum allowed distance to the next more 
massive haloes. In order to catch those extreme outliers, is thus 
sufficient to apply such a more rigorous scheme to the most dense 
environments only. In observational data, this effectively means re- 



ducing rich galaxy clusters to a single data point, while keeping 
field galaxies ungrouped. If we would however apply such a more 
rigorous scheme to the whole data set, the retained objects would 
have a significantly lower rf RZA in average, but at the same time 
we would remove a substantial fraction of data points with useful 
information, which is undesirable. 

We also found that there is only a very weak dependence 
of d RZA on the actual mass of haloes. Furthermore, the dis- 
placement \ifi\ shows no significant correlation with the mass 
either. This is consistent with the fact that the halo peculiar 
velocities u themselves are a lmost independent of halo mass 
(Suhhonenko & Gramann 2003). A simulation with larger boxsize 
would provide better statistics on the most massive objects, where 
a slight effect of smaller v can be seen, but they are not a focus of 
this work. 

From this theoretical study on identified haloes in a cosmo- 
logical simulation it is clear that the RZA can provide a reasonable 
estimate of the cosmological displacement and the initial position 
for most of the objects. The primary interest is now how well this 
scheme can be applied to more realistic observational data. This is 
discussed in the following section. 



4 RECONSTRUCTION FROM RADIAL PECULIAR 
VELOCITY DATA 

In this section we study how well the RZA can be applied to a re- 
alistic observational data set. For this, we apply the RZA to mock 
data, drawn from the same BOX160 simulation as before, but fea- 
turing a limited data volume, observational errors, and knowledge 
of only the radial part v r of the peculiar velocity. In particular, 
we want t o mimic two observational datasets: th e Cosmicfiows-1 
catalogue jTullv et alj|2009l ; ICourtois et ai]|2012h . which is avail- 
able through the Extragalactic Distance Databas^f] and features 
1797 galaxy distances and peculiar velocities in 742 groups within 
3000 k m /s, and the upcoming Co s micflows-2 catalogue j Courtois 
l2011al lbl; ICourtois & Tully||2012bl : iTullv & Courtoisll2012l) . which 
is expected to contain 7000 galaxy distances and peculiar veloci- 
ties and extend out to 6000 km/s. This way we can directly estimate 
what quality of constrained simulations we can expect if we apply 
the RZA to these data sets, and by how much the reconstruction 
quality would improve with the upcoming new dataset. 

4.1 Building the mock catalogues 

In order to construct mock catalogues, we first have to choose a 
mock observer. We choose a galaxy group identified in BOX160, 
which consists of three main haloes with virial masses of 4.9, 6.0 
and 6.7 x 10" M a /h; we choose the middle one, calling it the simu- 
lated "Milky Way". The position of this halo, fMw, marks the fixed 
position of the mock observer. The halo is at a distance of 17 Mpc/h 
to the simulated BOX 160 "Virgo", the next massive cluster. This 
distance is somewhat larger than the actual distance fr om the Milky 
Way to the centre of the Virgo Cluster (16 Mpc; see Fouq ue et al.l 
1200 lh : however, this is not an issue here. To construct a mock cat- 
alogue, we cast a sphere with a fixed radius R mwl around r MW , and 
consider only haloes within this sphere - the "observational vol- 
ume". We choose a default value of R max = 30 Mpc/h, which mim- 
ics a redshift cut at 3000 km/s, similar to the Cosmicflows- 1 cata- 

5 accessible online at edd.ifa.hawaii.edu 
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logue. We also used a larger volume of /? max = 60 Mpc//z, which 
resembles in quality the upcoming Cosmicflows-2 data. Since r 14 * 
is located near the centre of the 160 Mpc//i simulation box, even 
at ^max = 60 Mpc/A we are sufficiently far away from the box 
edge, so we will not suffer from the effects of the periodic bound- 
ary conditions. Our standard choice of the sphere within 30 Mpc//7 
of the chosen MW candidate comprises 2.8% of the total BOX160 
volume and places the BOX160 "Local Supercluster" containing 
"Virgo" well inside the datazone. The larger 60 Mpc/7z sphere also 
encompasses the BOX160 "Great Attractor" and touches the simu- 
lated "Coma" and "Perseus-Pisces" clusters at its edge. 

The R miix = 30 Mpc//i data volume leads to somewhat different 
conditions than considering the whole box, as we did in Section[3] 
In this data volume, there is a significant net displacement of about 
4 Mpc//z. Also, the mock volume as a whole is overdense com- 
pared to the box average, and further non-linearily evolved than the 
average field at z = 0. This is similar to the overdensity of the ob- 
served Local Universe r egion aroun d the LSS, which is well estab- 
lished by observations dTullv et al1l2008l) . While the main haloes 
in BOX160 have a median displacement \tfr\ of 8.7 Mpc//z and a 
median RZA error d RZA of 1.36 Mpc//i, for the 1243 main haloes 
inside the mock volume the median \iff\ is at 11.7 Mpc/7z and the 
median d RZA at 2.8 Mpc/fr. The net overdensity also means that 
there is a net inflow into the observational volume. It is interesting 
to see how the reconstruction will handle such a difficult case. In 
the Cosmicflows-1 data, the value of Ho is chosen suc h that there 
is no net inflow/outflow w ith respect to the data zone dTullv et all 
2008; Cou rtois et ai]|2012l) . This restriction may not be required in 
general. For the mock data, we keep the value of h = 0.73 as given 
by the simulation parameters. 

Observationally, radial peculiar velocities uf and their errors 
are obtained from the measured galaxy distance r, some estimate 
of the absolute distance error Ar, and the observed redshift v° bs via 

sj" = v f s - r ■ Ho , (23) 
AvT = -Ar ■ H . (24) 

To mimic these errors, we first take the known radial velocity of a 
halo from the AHF catalogue, l) AHF , and the known distance r AHF 
to the mock observer. In the Cosmicfiows- 1 catalogue, the different 
points come from different types of me asurements with diff erently 
distributed errors between 7 and 20 % dCourtois et al.l l2012): here, 
we simplify the situation by assuming relative distance errors Sr 
that are Gaussian distributed with a constant ((5r) mls . If we choose a 
fixed value for this rms accuracy of the distance, then the absolute 
mock distance error is generated via 

Ar mock = G(0, 1) ■ (dr) mK ■ r AHF , (25) 

where G(0,1) is a random number drawn from a Gaussian distribu- 
tion with mean and variance 1 . Then, the radial velocity with the 
added error for the mock catalogue is computed as follows: 

^mock = ^AHF + Au mook > (26) 
AjJ mock = _ Ar m00k . Ho _ (2 7) 

This data forms the WF/CR input constraints, where i = I, ... N 
goes over all haloes in the mock catalogue. 

In observational data, v° bs is observed in the rest frame of the 
observer and is then transformed to the rest frame of the CMB 
dipole. This is preferred cosmological frame of reference within 
which the primordial cosmological perturbations are analyzed and 
cosmological simulations are conducted. 

For the mocks, we achieve a similar setup by not considering 



the peculiar motion of the simulated MW halo where we placed the 
mock observer. We rather take directly the velocities in the fixed 
rest frame of the simulation box as computed by AHF. 

Our "standard" mock catalogue is the C30_10, which we con- 
sider a "typical" sparse peculiar velocity dataset. We take the proce- 
dure of considering only main haloes as a proxy for the "grouping" 
performed on observational data. The C30.10 contains all main 
haloes above a mass cut M m i„ = 10 1L9 M G //i within R mslx = 30 
Mpc//i, yielding 588 radial velocity datapoints. This choice gives 
the C30_10 similar properties to the grouped Cosmicflows-1 cat- 
alogue, but somewhat more sparse^]. For the "observational" rms 
distance error (<Sr) rras of this mock catalogue we choose a value of 
10%. This is similar to the observational data: while the median rms 
distance error is somewhat higher at 13% on the individual galaxies 
in Cosmicflows-1, this error reduces when the galaxies are arranged 
in groups. The C30.10 is interesting because even if the data are 
improving in terms of the number of individual galaxy distances, 
the number of galaxy groups in a radius of 30 Mpc//? is probably 
not going to vary by much, nor is the accuracy on the most nearby 
distances. Note that a rms distance error of (<5r) mls = 10% leads 
to a relatively high error on the peculiar velocities: at a distance of 
r = 30 Mpc/fe, this corresponds to an rms error of 300 km/s on the 
peculiar velocity, which is approximately equal to the variance cr 
of the peculiar velocity values. 

In order to also have a less sparse sample, we also use a lower 
mass cut at M mm = 10" 5 M Q /h; the corresponding mock is la- 
belled E30_10 and contains 1243 radial velocities within R mlix = 30 
Mpc//j. We also use a mock with the larger volume of R mdx = 60 
Mpc/A, labelled E60_10, mimicking the upcoming Cosmicflows-2 
data. This mock features 7637 peculiar velocities. Both of these 
mocks also feature the same kind of distance errors with ((5r),. ras = 
10%. 



4.2 Application of the RZA to mock data 

In order to apply RZA reconstruction of the displacement field 
from the mock catalogues, we first need to estimate the three- 
dimensional peculiar velocity vector v from its given radial com- 
ponent v, while simultaneously accounting for the uncertainties of 
the v, values due to the mock observational errors. For this, we use 
the Wiener Filter (WF) reconstruction method introduced in 12.11 
The radial peculiar velocities in the mock catalogues are very well 
approximated by a Gaussian distrib ution, therefore the WF is a very 
good estimator dZaroubi et al As the prior model we use the 

same WMAP3 power spectrum that was used to set up the BOX 160 
simulation. It is one of the strengths of the WF method that it can 
not only filter out noise and extrapolate/interpolate the field into 
unsampled regions from sparse and noisy datasets, but it can also 
recover any linear functional of the overdensity 5 (here, the three- 
dimensional velocity field v) from a sampling of any other linear 
functional (here, the radial velocity component v r ). 

Using the radial velocity data points from the mock catalogue 
as constraints c,, we evaluate Eq.[2]in order to reconstruct all three 
cartesian components of v. Note that we have to evaluate Eq. 10 
only at the positions r, of the mock data points, which significantly 



Choosing this sample was motivated by the fact that at the time we com- 
menced this study, we had a preliminary version of the Cosmicfiows- 1 avail- 
able that contained exactly the same number of 588 galaxy groups within 30 
Mpc/ft. The current version of the Cosmicflows-1 catalogue is less sparse 
with 742 galaxy groups within the same volume. 



Reverse Zeldovich Approximation 1 1 



0.15 



0.05 



- 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1— 

RZA from exact 3D vpec — e — 
RZA from E60 10 a 
RZA from C30_10 — v— 

no RZA - --e- - 




20 



d RZA [Mpc/h] 



Figure 5. Distribution function of the RZA displacement error d RZA for 
main haloes within 30 Mpc//i from the BOX 160 mock observer for RZA 
reconstructions from the exact 3D halo peculiar velocities (solid), the two 
radial peculiar velocity mocks E60.10 (dotted) and C30.10 (dashed), and 
no reconstruction at all, i.e. d = (dot-dashed). The symbols are placed at 
the median of each distribution. Each pdf was convolved with a 0.5 Mpc//i 
Gaussian kernel to obtain a smooth curve. 



reduces the computational cost compared to a reconstruction of the 
entire field. We can then continue and apply the RZA. The only dif- 
ference from the approach in Section[3]is that, since the true values 
of v are unknown, we now take the WF estimate u WF instead. This 
technique is straightforward to apply also to a real observational 
dataset. 

We perform the RZA reconstruction on the mock datasets by 
evaluating Eqs. i20l and d21b to recover (fr RZA and x^j A for each 
object in the mock dataset, only this time using the u WF estimate for 
v. Since we have the initial conditions of the simulation available, 
we can check the results and also evaluate the RZA error cP 2 ^ via 
Eq. J22t for the different mock catalogues to estimate the accuracy 
of the reconstruction. 
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Figure 6. The distribution of the three Cartesian components of the resid- 
ual ip RZA - if/ for the main haloes within 30 Mpc/ft from the BOX160 mock 
observer for RZA reconstructions from the exact 3D halo peculiar veloci- 
ties (solid) and the two radial peculiar velocity mocks E60.10 (dotted) and 
C30.10 (dashed). The red curves show the normal distributions fitted for 
the actual distributions. 



4.3 Reconstructed displacements 

Figure [5] shows the RZA displacement error d RZA distribution of 
the RZA reconstruction obtained for the two mocks E60_10 and 
C30.10 with the WF technique described above. Both mocks fea- 
ture only the radial component of the halo peculiar velocity and ob- 
servational errors of Sr = 10%. The RZA displacement error d RZA 
resulting from this procedure is compared to the rf RZA in the case 
if we know the full three-dimensional peculiar velocity vectors of 
the haloes with no error, like in the analysis in Section|3] and to the 
case of no Lagrangian reconstruction (then, the error is simply 
the amplitude of the displacement itself). The rf RZA values consid- 
ered for Figure |5]are only those of the haloes inside the mock data 
volume, i.e. a sphere of 30 Mpc/h radius around the mock observer 
at tmw- This volume is an overdense region compared to the whole 
computational box of the BOX160 simulation. Therefore, the mean 
displacements and peculiar velocities are higher. In this mock vol- 
ume, the median displacement field is 11.7 Mpc//i, which is also 
the median error of the estimated initial positions if one does not 
use Lagrangian reconstruction. With perfect knowledge of the 3D 
halo peculiar velocities v, RZA reconstruction reduces this error 



to d RZA = 2.8 Mpc//i. For the mocks, which contain only the ra- 
dial component v r of v and are sparse and noisy, the WF + RZA 
method produces a median d RZA of typically around 5 Mpc/h (5.30 
Mpc//i for the C30_10 mock and 4.64 Mpc/h for the better-quality 
E60.10 mock). This is still a significant improvement over the pre- 
vious method not using Lagrangian reconstruction. The increased 
datapoint density and data volume in the E60_10 mock leads to only 
a small improvement in reconstruction quality within 30 Mpc/h of 
/■mw- This confirms that the Wiener filter is capable of producing an 
already good estimate of the true underlying field from very sparse, 
noisy, and incomplete data. 

Figure [6] shows the distribution of the error in the RZA dis- 
placement for the three Cartesian components for three mock cat- 
alogs. The catalogs consist of the case of exact 3D halo peculiar 
velocities and the two radial peculiar velocity mocks, E60.10 and 
C30.10. For all cases and for the three different components the 
distribution of the residual is well fitted by a normal distribution. 
The anisotropic distribution of the data points of the mock catalogs 
leads to an anisotropic distribution of the residuals. For the particu- 
lar constrained BOX160 simulation, from which the mock catalogs 
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are drawn, the distributions of the residuals in the X and Z direc- 
tions are very similar, and both are distinct from the distribution in 
the Y direction. 

Figure [7] shows the actual displacement field if/ and the re- 
constructed if/ RZA for the E30_10 mock in a 15 Mpc/h thick slice 
through the mock volume. The comparison reveals that although 
the input data consisted only of the radial components of the haloes' 
peculiar velocity vectors, the Wiener filter + RZA method man- 
ages to recover a good estimate (right panel of Figure |7) of the 
true three-dimensional displacement field (left panel of Figure [TJ. 
The RZA error d RZA values for the data points of this mock are 
color-coded in the right panel. The reconstruction is very accurate 
{cF^ < 2 Mpc/7;) in the less dense regions of the mock outside 
the very dense "supercluster" structure above the centre. Inside the 
supercluster however, the original displacements are generally not 
recovered by this procedure. This can be explained because, even 
after the grouping procedure, peculiar velocities inside such dense 
environments are dominated by non-linear effects. In order to pre- 
vent the occurence of high d RZA values above 5 Mpc//j, one would 
have to enforce a much stronger grouping inside such dense clus- 
ters. 



5 CONSTRAINING INITIAL CONDITIONS WITH RZA 

In the last section we established that using a realistic peculiar ve- 
locity dataset, the RZA can give good estimates of the underlying 
cosmological displacement field and the initial position of the dat- 
apoints. The main goal remains, however, to use such data to con- 
strain cosmological initial conditions and to run constrained simu- 
lations. We therefore have to fit the RZA into the CR method (Sec- 
tion [Z2} that we use to generate constrained ICs. The scheme has 
to be designed around the basic requirement that the input for the 
CR algorithm that produces Gaussian ICs has to consist of a set of 
constraints T = {cj. 

5.1 Method 

The RZA allows us to improve upon the previously used method, 
which consisted of taking the radial peculiar velocity values v r j at 
their observed positions r, at z = and using them as constraints 
c, for the ICs at z inil . (In the following, we refer to this as "Method 
I".) One could think of now taking as constraints the reconstructed 
displacement fields if/ RZA at the reconstructed initial positio ns jc^f A 
instead . Essentially the same approach was employed by lLavauxl 
(2010) to construct constrained initial conditions, except that the re- 
constructed displacement fields and initial positions resulted from 
a MAK reconstruction procedure applied to the 2MASS redshift 
survey. However, in our case this approach will not work. The dis- 
placement field ift RZA was obtained with the WF/CR algorithm, and 
using its results as constraints for a CR would mean to apply the 
WF/CR algorithm a second time, i.e. iteratively to the same data 
set. This results in an unacceptable loss of power due to the conser- 
vative nature of the WF, and the additional information gained by 
performing a Lagrangian reconstruction would be smoothed away 
by the filter. It is a general rule that applying the WF iteratively 
does not lead to meaningful results. 

To overcome this obstacle, we tried different modifications of 
this idea. The optimal strategy that we found consists of displacing 
the radial velocity datapoints v r from the (mock) catalogue from 
their observed position r at z = "back in time" to the RZA- 
reconstructed initial position ^JJf t A , while leaving all other proper- 



ties attached to the datapoints unchanged (the direction of the con- 
strained component of v, its amplitude, and the associated observa- 
tional error). In the following, we refer to this as "Method II". (A 
ready-to-use implementation of this shifting procedure is included 
in the ICeCoRe code.) We then use this shifted set of constraints 
as input to generate a new CR. In this case, we are not applying 
the WF/CR iteratively, since we use the first (WF) run only to shift 
the datapoint positions, before we use them for the second (CR) 
run. We therefore change only the positions r, of the original con- 
straints c, to obtain the new constraints cf ZA that are actually used 
for generating ICs. 

Note that for the new constraints cf ZA , the constrained compo- 
nent is no longer the radial component with respect to the observer 
position tmw. because the position of the constraint was shifted 
from its observed position r to its estimated initial position *j^ A . 
For this, we used our new ICeCoRe codfl It allows to constrain 
arbitrarily directed components v n (along any unit vector e^) of the 
peculiar velocity vector v for applying the WF/CR operator in Eq. 
i ll It . which is straightforward to implement using Eqs. (|5) and ([6). 
This however adds the requirement to explicitly specify for each 
velocity constraint the unit vector e p of the component to be con- 
strained. In our case, the unit vectors of the constrained peculiar 
velocity components will correspond to e r for the observable ra- 
dial direction v r with respect to the observer at z = 0, however the 
positions x of the constraints are not those at z = 0, but displaced 
to the RZA-estimated initial position x RZA according to Eq. ( 12 It . 
so that e r is not parallel to J^f t A . 

In practice, the steps for obtaining constrained ICs for /V-body 
simulations using the RZA method are as follows: 

(i) Apply a grouping to the input radial peculiar velocity data to 
"linearize" it and to filter out virial motions. The resulting set of 
radial peculiar velocities v rJ at positions r, defines the set of con- 
straints c h 

(ii) Run a WF reconstruction on the c, by evaluating Eq. (TJ at 
the positions r, This will yield an estimate of the three-dimensional 
velocities u, at r,. 

(iii) Apply the RZA by evaluating [2JJ to yield an estimate of the 
initial positions *j^ A . 

(iv) Build a new set of constraints cf ZA by shifting the positions 
of the c, back in time from r, to x^f A . 

(v) Run the CR algorithm, using the constraints cf ZA , to obtain 
a realization of constrained initial conditions, scale it to the desired 
redshift twit, an d run a constrained /V-body simulation. 

The implementation details of how to set up particle initial con- 
ditions from the 6 CR field are not different from conventional cos- 
molog ical simulations, see e.g. lPrunet et al.lj2008l) : lGottlober et al.l 

fcoioh . 

With this scheme, it is possible to generate significantly more 
accurate constrained simulations from radial peculiar velocity cat- 
alogues than with the previous method without Lagrangian recon- 
struction. In paper III of this series we will investigate such con- 
strained simulations in detail. 



7 Our previously used numerical implementation of the CR algorithm was 
restricted to constraining the component that was the radial direction at its 
position (i.e. parallel to the position vector w.r.t. the observer). ICeCoRe 
removes this restriction and allows a much more flexible placement of dif- 
ferent types of constraints. 
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Figure 7. Actual BOX160 displacement field (left) vs. RZA reconstruction from radial peculiar velocities (using mock E30.10, with mock observational 
distance errors of 10%). The dots show the z = position r of the haloes contained in the mock data; the lines show the displacement vector iji from the initial 
conditions. In the reconstruction, the RZA error d RZA is highlighted by colour-coding of the points. The contours show the underlying dark matter density 
(smoothed with a 1.5 Mpc/h Gaussian). Shown is a 15 Mpc//i thick slice containing the simulated BOX160 Local Supercluster with Virgo at X=82, Y=74. 
The red cross marks the position of the mock observer. 



6 SUMMARY, CONCLUSIONS AND OUTLOOK 

This work is the first in a series of papers investigating the prob- 
lem of generating initial conditions for constrained simulations of 
the Local Universe using galaxy peculiar velocities. Our main mo- 
tivation is to extend the previously used Constrained Realizations 
(CR) method to include a Lagrangian reconstruction scheme in or- 
der to improve the quality of the resulting constrained simulations. 
In this paper, we propose a method to reconstruct the cosmological 
displacement field from observable peculiar velocities. We find that 
for this task, the relatively simple Zeldovich approximation (ZA) is 
a very powerful approach when applied directly to peculiar veloc- 
ities, instead of applying it to a sampling of the density field as it 
is commonly done in various other contexts. Instead of using the 
ZA to approximate the cosmic structure formation forward in time, 
which was its original motivation, we use it in the reverse time di- 
rection to estimate the cosmological initial conditions at high z in 
the linear regime from the peculiar velocities at z = 0. We call this 
approach the Reverse Zeldovich Approximation (RZA). 

As the main finding of this work we conclude that, for the 
task of generating constrained realisations of the Local Universe 
from peculiar velocity data, a Lagrangian reconstruction scheme 
such as the RZA reconstruction presented here provides a signifi- 
cant improvement over treating the peculiar velocities with linear 
theory, which was the previous approach. We tested this method 
with realistic radial peculiar velocity mock catalogues drawn from 
a cosmological reference simulation (which itself is a constrained 
simulation of the Local Universe). For a reconstruction of the three- 
dimensional galaxy peculiar velocities from the radial components 
in the mock catalogues, we used the well-established Wiener fil- 
ter method. We confirm that it performs very well in compensating 
for the radial component limitation and the observational errors. 
We found that directly applying the WF/CR operator to peculiar 
velocity data at z = is actually not a very good estimate of the 



initial conditions due to the displacements, the non-linearities of 
the data, and the relatively large systematic errors. However, the 
Wiener filter reconstructed 3D peculiar velocity field is a reason- 
ably accurate estimate of the displacement field if/, with a typical 
error of a few Mpc//j. This allows us to generate a better estimate 
of the initial positions of the data points' progenitors. We find that 
for halo peculiar velocities, the RZA is able to recover the correct 
initial positions with a median error of only 1.36 Mpc/h, where we 
use the BOX160 simulation as a reference universe. For the realis- 
tic mock catalogues drawn from this simulation (featuring various 
observational errors) this median increases to 5 /r'Mpc. This is 
a significant improvement over the previous approach of neglect- 
ing the displacement field, which introduces errors on a scale of 
10 /i^'Mpc or even higher. 

In paper II of this series, we will investigate in detail how 
much the individual observational errors and limitations affect the 
result of the RZA reconstruction, in order to derive what obser- 
vational data sets would be ideally suited for a reconstruction of 
the ICs of our Local Universe. We can already make a first predic- 
tion based on the work presented here. Distance catalogues con- 
taining spiral galaxies, such as Tully-Fisher data, may be a better 
choice than data covering early-type galaxies such as the funda- 
mental plane and surface brightness fluctuations methods. Spiral 
galaxies provide a more homogeneous mapping of the sky and are 
less biased towards high-density regions where non-linear effects 
become stronger. These regions are exactly where the RZA recon- 
struction method fails (the displacement error becomes very high 
due to non-linearities). It will also be important to optimize the 
data grouping method for filtering out virial motions. Our results 
suggest that the best strategy is to reduce each group of data points 
that form a virialised or otherwise strongly gravitational ly interact- 
ing structure into a single data point, in order to remove non-linear 
virial motions completely. Testing the performance of different in- 
put data in this context will be the subject of further studies. 
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The displacement and initial position estimate generated by 
the RZA reconstruction can be used to constrain initial conditions 
(ICs) for constrained cosmological simulations. These constrained 
ICs can be set up in a second WF/CR operator step, using a new set 
of constraints that is obtained by shifting the original datapoints to 
their RZA-estimated initial positions. Constrained simulations can 
be run by feeding the result of that procedure as the ICs for an N- 
body simulation code. In paper III of this series, we will analyze 
in detail the accuracy and limitations of such constrained simula- 
tions and quantify the increase in reconstruction quality that can be 
obtained by using RZA. 

We are also exploring the extension of the RZA reconstruc- 
tion to higher-order Lagrangian perturbation theory. However, any 
method of higher order than the first-order Zeldovich approxima- 
tion will break the locality of the approximation, and the whole 
field has to be considered instead of the given discrete data points. 
This will invariably introduce additional systematic errors. It is a 
very difficult task to overcome this obstacle if one is presented 
with very sparse and inhomogeneously sampled input data. With 
the first-order RZA, we therefore may already have found a near- 
optimal method of reconstructing initial conditions from observa- 
tional galaxy radial peculiar velocity data. 
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